In computer programming, the lexer hack (as opposed to "a lexer hack") describes a common solution to the problems in parsing ANSI C that arise because its reference grammar is context-sensitive. In C, classifying a sequence of characters as a variable name or a type name requires contextual information from the phrase structure, which prevents the lexer from being context-free.

== Problem ==
The problem is that in the expression (A)*B, the lexical class of ''A'' cannot be determined without further contextual information. The code could be a multiplication of two variables, in which case ''A'' is a variable; written unambiguously, this is A * B. Alternatively, it could be a cast of the dereferenced value of ''B'' to the type ''A'', in which case ''A'' is a typedef-name; written unambiguously, this is (A) (*B).

In more detail, the lexer performs one of the earliest stages of converting source code into a program. It scans the text to extract meaningful ''tokens'', such as words, numbers, and strings. The parser analyzes sequences of tokens and attempts to match them to syntax rules representing language structures, such as loops and variable declarations. A problem occurs if a single sequence of tokens can match more than one syntax rule. This ambiguity can happen in C if the lexer does not distinguish between variable and typedef identifiers (Roskind, James A.). For example, in the C expression (A)*B the lexer may find these tokens:
# left parenthesis
# identifier 'A'
# right parenthesis
# operator '*'
# identifier 'B'

Because ''A'' arrives as a plain identifier, the parser has no context to decide between the two readings: it can interpret this as the variable ''A'' multiplied by ''B'', or as the type ''A'' used to cast the dereferenced value of ''B''.
This ambiguity is known as the "typedef-name: identifier" problem, after the name of the problematic production rule ("The context sensitivity of C's grammar").